Starting a Dialog between Model Checking and Fault-tolerant Distributed Algorithms

Authors

  • Annu John
  • Igor Konnov
  • Ulrich Schmid
  • Helmut Veith
  • Josef Widder
Abstract

Fault-tolerant distributed algorithms are central to building reliable, spatially distributed systems. Unfortunately, the lack of a canonical, precise framework for fault-tolerant algorithms is an obstacle to both verification and deployment. In this paper, we introduce a new domain-specific framework that captures the behavior of fault-tolerant distributed algorithms adequately and precisely. At the center of our framework is a parameterized system model in which control flow automata are used for process specification. To account for the specific features and properties of fault-tolerant distributed algorithms for message-passing systems, our control flow automata are extended to model threshold guards as well as the inherent non-determinism stemming from asynchronous communication, interleavings of steps, and faulty processes. We demonstrate the adequacy of our framework in a representative case study in which we formalize a family of well-known fault-tolerant broadcasting algorithms under a variety of failure assumptions. Our case study is supported by model checking experiments with safety and liveness specifications for a fixed number of processes. In the experiments, we systematically varied the assumptions on both the resilience condition and the failure model. In all cases, the experimental results coincided with the theoretical results predicted in the distributed algorithms literature. This gives clear evidence for the adequacy of our model. In a companion paper [18], we address the new model checking techniques necessary for parametric verification of the distributed algorithms captured in our framework.
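To make the notions of threshold guard, resilience condition, and non-deterministic message reception concrete, the following is a minimal Python sketch. It is not the formalism of the paper: the names `Params`, `resilience_holds`, `guard_enabled`, and `possible_receive_counts` are hypothetical, and the resilience condition n > 3t is assumed here only as a typical example for Byzantine faults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """System parameters (hypothetical names, chosen for this sketch)."""
    n: int  # total number of processes
    t: int  # assumed upper bound on the number of faulty processes
    f: int  # actual number of faulty processes in a run


def resilience_holds(p: Params) -> bool:
    # Assumed example condition: n > 3t and 0 <= f <= t.
    return p.n > 3 * p.t and 0 <= p.f <= p.t


def guard_enabled(received: int, threshold: int) -> bool:
    # A threshold guard fires once enough messages have been received.
    return received >= threshold


def possible_receive_counts(sent_by_correct: int,
                            already_received: int,
                            p: Params) -> range:
    # Asynchrony and faults modeled as a non-deterministic choice:
    # the counter may stay unchanged or grow up to the number of
    # messages sent by correct processes plus at most f messages
    # contributed by faulty processes.
    return range(already_received, sent_by_correct + p.f + 1)


if __name__ == "__main__":
    p = Params(n=4, t=1, f=1)
    print(resilience_holds(p))
    print(list(possible_receive_counts(2, 1, p)))
    print(guard_enabled(2, p.t + 1))
```

A model checker would explore every value in `possible_receive_counts` at each step, which is why fixing the number of processes, as in the experiments described above, keeps the state space finite.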

Similar resources

Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms

Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems....

Challenges in Model Checking of Fault-tolerant Designs in TLA

Although, historically, fault tolerance is connected to safety-critical systems, there has been an increasing interest in fault tolerance in mainstream applications such as the cloud. There is a need for formal specification and verification of industrial fault-tolerant designs, since they integrate, in a non-trivial way, the ideas from distributed algorithms, whose correctness is usually based o...

What You Always Wanted to Know About Model Checking of Fault-Tolerant Distributed Algorithms

Distributed algorithms have numerous mission-critical applications in embedded avionic and automotive systems, cloud computing, computer networks, hardware design, and the internet of things. Although distributed algorithms exhibit complex interactions with their computing environment and are difficult to understand for human engineers, computer science has developed only very limited tool supp...

Tutorial on Parameterized Model Checking of Fault-Tolerant Distributed Algorithms

Recently we introduced an abstraction method for parameterized model checking of threshold-based fault-tolerant distributed algorithms. We showed how to verify distributed algorithms without fixing the size of the system a priori. As is the case for many other published abstraction techniques, transferring the theory into a running tool is a challenge. It requires understanding of several verif...

Verification of Fault-Tolerant Protocols with Sally

Sally is a model checker for infinite-state systems that implements several verification algorithms, including a variant of IC3/PDR called Property-Directed K-induction. We present an application of Sally to automated verification of fault-tolerant distributed algorithms.

Journal:
  • CoRR

Volume: abs/1210.3839

Publication date: 2012